Our perceptual system provides us
with an internal representation of the world, constructed after sense data
are received through our sense organs, where they are transduced by
receptors into electrical impulses in a data-reducing neural coding process
culminating in the firing of action potentials. Debate about the development of
our perceptual mechanisms can be split into nativist and empiricist camps,
although most psychologists would now acknowledge at least some interaction
between biological and learnt processes. Alternatively, it can be split into
top-down and bottom-up, i.e. knowledge-driven vs data-driven, views of how we
perceive objects, although again some interplay between the two seems necessary
and can be demonstrated by "priming" subjects.
That visual perception is not an
easy task did not become apparent until similar feats of pattern recognition
were attempted on images with computers. Our visual system starts to shape the
raw data even as an image is formed on our retinas. Marr outlined four major
stages in vision processing, the first being a "grey-level description",
representing the intensity of light at each point in the retinal image in order
to discover regions in the image and their boundaries.
A simpler example of delineating an
object's boundaries might be the recognition of a square. The theoretical
explanation behind this involves a hierarchical feature net, where detectors
for the simpler elements trigger detectors for the more complex elements. In
this case, our feature detectors first recognise four straight lines in two
perpendicular orientations, then four right angles, all of which together
finally trigger the square detector, which recognises the composite figure.
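The hierarchy just described can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions, not a model taken from the literature: the function names, the representation of lines as pairs of endpoints, and the extra equal-length check at the top level are all hypothetical choices made for the example.

```python
# Toy hierarchical feature net for a square (illustrative assumptions only).
# Each segment is a pair of endpoints: ((x1, y1), (x2, y2)).

def line_orientation(segment):
    """Level 1: report 'horizontal', 'vertical', or None for other lines."""
    (x1, y1), (x2, y2) = segment
    if y1 == y2 and x1 != x2:
        return "horizontal"
    if x1 == x2 and y1 != y2:
        return "vertical"
    return None

def right_angle_corners(segments):
    """Level 2: points where a horizontal and a vertical segment share an endpoint."""
    corners = set()
    for a in segments:
        for b in segments:
            if (line_orientation(a) == "horizontal"
                    and line_orientation(b) == "vertical"):
                corners.update(set(a) & set(b))
    return corners

def is_square(segments):
    """Level 3: fires only when the lower-level detectors report the right pattern."""
    if len(segments) != 4:
        return False
    orientations = [line_orientation(s) for s in segments]
    if orientations.count("horizontal") != 2 or orientations.count("vertical") != 2:
        return False
    if len(right_angle_corners(segments)) != 4:
        return False
    # Assumed extra check: all four sides have the same length.
    lengths = {abs(x2 - x1) + abs(y2 - y1) for (x1, y1), (x2, y2) in segments}
    return len(lengths) == 1

square = [((0, 0), (2, 0)), ((2, 0), (2, 2)), ((2, 2), (0, 2)), ((0, 2), (0, 0))]
rectangle = [((0, 0), (3, 0)), ((3, 0), (3, 2)), ((3, 2), (0, 2)), ((0, 2), (0, 0))]
print(is_square(square), is_square(rectangle))
```

The point of the sketch is only that the square detector never looks at the raw lines itself; it responds to the outputs of the simpler detectors beneath it.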
Before discussing what detection mechanisms
exist, I want first to consider the basis for neural coding of the
photoreceptors' input. The evolution of the eye can be seen as the culmination
so far of a string of tiny steps all the way from complete insensitivity to
light. As in a camera, light streams in towards the retina, focused by
the lens, with its level controlled by a diaphragm (the iris), and into a
black-lined chamber (lined by the choroid coat, which continues forward into
the iris surrounding the pupil) where it stimulates photoreceptive cells
(around 120 million achromatic, higher-sensitivity rods and 6 million cones,
which come in three types supporting colour vision and are concentrated in the
fovea).
These receptors are linked to the
ganglion cells by bipolar cells (which, by an anatomical curiosity, lie
between the receptors and the lens, so that light must pass through them
first), with between 10 and 1000 receptors per ganglion cell, depending on how
central the receptor is. Each of these groupings forms a "receptive field"
(which was defined by Kuffler & Barlow (1953) as "that area of the retina such
that light (or a pattern of light) falling within that area affects the pattern
of firing"), with excitatory and inhibitory inputs being spatially summed.
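To make the idea of spatial summation concrete, here is a minimal sketch of a single on-centre/off-surround receptive field. The function name, the weights, and the use of averages are my own assumptions for illustration; real ganglion cells are of course not this tidy.

```python
def receptive_field_response(centre_inputs, surround_inputs,
                             centre_weight=1.0, surround_weight=1.0):
    """Spatially sum excitatory (centre) and inhibitory (surround) inputs.

    Averages are used (an assumption) so that, with equal weights,
    uniform illumination of the whole field cancels to zero.
    """
    excitation = centre_weight * sum(centre_inputs) / len(centre_inputs)
    inhibition = surround_weight * sum(surround_inputs) / len(surround_inputs)
    return excitation - inhibition

# Hypothetical light levels falling on two centre and four surround receptors.
print(receptive_field_response([1, 1], [0, 0, 0, 0]))  # spot on the centre only
print(receptive_field_response([1, 1], [1, 1, 1, 1]))  # uniform illumination
print(receptive_field_response([0, 0], [1, 1, 1, 1]))  # ring on the surround only
```

Under these assumptions the cell responds most to a spot confined to its centre, is silent under uniform light, and is driven below baseline by light on the surround alone.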
Ernst Mach and Ewald Hering
provided evidence in the 19th century of lateral inhibition with their Mach
bands, in which a uniformly dark region and a uniformly bright region are
joined by a middle section of graded brightness.
Even though the intensity
of the middle section varies uniformly, our eyes perceive a stripe of
particularly dark black and a stripe of particularly bright white on
either side of it. This is due to on-centre/off-surround receptive fields, whose
inhibitory rings overlap, so that each cell is laterally inhibited by its
neighbours, causing the spike rate to be particularly low on the dark edge and
particularly high on the bright edge. By coding for change (spatial in the case
of lateral inhibition, but also temporal), the amount of data required to
describe a scene is drastically reduced. Selective coding of information about
pattern, or the changes of light and dark at the retina (Ratliff & Hartline;
Barlow), is an example of redundancy reduction.
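The Mach band effect can be reproduced with a very small simulation. The sketch below is an assumption-laden toy, not a physiological model: each cell is excited by its own input and inhibited by the average of its two immediate neighbours, the inhibition strength of 1.0 is arbitrary, and the luminance profile is invented for the example.

```python
def lateral_inhibition(luminance, inhibition=1.0):
    """Each cell's response is its own input minus the scaled mean of its neighbours.

    Cells at either end reuse their own value for the missing neighbour.
    """
    responses = []
    last = len(luminance) - 1
    for i, centre in enumerate(luminance):
        left = luminance[max(i - 1, 0)]
        right = luminance[min(i + 1, last)]
        responses.append(centre - inhibition * (left + right) / 2)
    return responses

# Dark plateau, uniform ramp, bright plateau (hypothetical intensity values).
profile = [0, 0, 0, 0, 1, 2, 3, 4, 5, 5, 5, 5]
responses = lateral_inhibition(profile)
print(responses)
```

With these values the only non-zero responses are a dip at the last cell of the dark plateau and a peak at the first cell of the bright plateau, which correspond to the dark and bright bands. Everywhere the intensity is constant or changing uniformly, the output is zero, so twelve input values are summarised by two signals; this is the sense in which coding for change reduces redundancy.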
Hubel & Wiesel (1959)
identified three types of cells, which respond only to very low-level, basic
features, such as lines of differing orientations, lengths and directions:
1. simple cells: orientation-tuned; separate excitatory and inhibitory regions
2. complex cells: orientation-tuned; often direction-selective; no separate
   excitatory or inhibitory regions
3. hyper-complex cells: orientation-tuned; often direction-selective; no
   separate excitatory or inhibitory regions; selective for length (at either
   one or both ends)
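The difference between the first two cell types can be illustrated with a toy model. Everything in it is an assumption made for the example: the 3x3 weights, the rectification of negative sums to zero, and the idea of modelling a complex cell as the maximum over simple cells at different positions are common textbook simplifications, not measurements from Hubel & Wiesel.

```python
# Hypothetical weights for a simple cell tuned to a vertical bar:
# an excitatory central column flanked by inhibitory columns.
VERTICAL_SIMPLE_CELL = [
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
]

def simple_cell(patch, weights=VERTICAL_SIMPLE_CELL):
    """Weighted sum over a 3x3 patch; negative totals are clipped to zero."""
    total = sum(w * p
                for w_row, p_row in zip(weights, patch)
                for w, p in zip(w_row, p_row))
    return max(total, 0)

def complex_cell(image, weights=VERTICAL_SIMPLE_CELL):
    """Strongest simple-cell response across all horizontal positions.

    The image is assumed to have exactly three rows.
    """
    width = len(image[0])
    return max(simple_cell([row[x:x + 3] for row in image], weights)
               for x in range(width - 2))

vertical_bar = [[0, 1, 0]] * 3
horizontal_bar = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
shifted_bar = [[1, 0, 0]] * 3
print(simple_cell(vertical_bar), simple_cell(horizontal_bar), simple_cell(shifted_bar))

wide_image = [[0, 0, 0, 1, 0]] * 3  # vertical bar away from the left edge
print(complex_cell(wide_image))
```

The simple cell responds to the vertical bar only when it falls on the excitatory column, whereas the complex cell in this sketch responds to the same bar wherever it appears, which is one way of reading "no separate excitatory or inhibitory regions".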
However, these on their own are not
feature detectors: although they are selective for certain lines or bars, the
responses of single cells are necessarily ambiguous, being modulated by a
variety of different stimulus dimensions (e.g. length, position, luminance,
contrast, wavelength, motion). Moreover, information about orientation is coded
in a distributed fashion, i.e. preserved in the pattern of firing across many
cortical cells, rather than in any single one.
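A small sketch may help show why a single cell is ambiguous while a population is not. The cosine tuning curve, the four preferred orientations, and the vector-sum decoding rule are all assumptions chosen for illustration (they are a standard simplification, not a claim about how the cortex actually reads out orientation).

```python
import math

# Hypothetical preferred orientations (degrees) of four cortical cells.
PREFERRED = [0, 45, 90, 135]

def cell_response(orientation_deg, preferred_deg, contrast=1.0):
    """Cosine tuning curve; angles are doubled because orientation repeats every 180 degrees."""
    delta = math.radians(2 * (orientation_deg - preferred_deg))
    return contrast * (1 + math.cos(delta)) / 2

def decode_orientation(responses, preferred=PREFERRED):
    """Recover orientation from the pattern of firing across all cells (vector sum)."""
    x = sum(r * math.cos(math.radians(2 * p)) for r, p in zip(responses, preferred))
    y = sum(r * math.sin(math.radians(2 * p)) for r, p in zip(responses, preferred))
    return (math.degrees(math.atan2(y, x)) / 2) % 180

# One cell alone cannot tell a 30-degree line from a 150-degree line...
print(cell_response(30, 0), cell_response(150, 0))

# ...but the pattern across the population can, even at low contrast.
for angle in (30, 150):
    pattern = [cell_response(angle, p, contrast=0.4) for p in PREFERRED]
    print(angle, decode_orientation(pattern))
```

In this toy the cell preferring 0 degrees fires identically to lines at 30 and 150 degrees, and its rate also changes with contrast, yet the orientation can still be recovered from the relative firing of all four cells.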
Lettvin et al. (1959) first provided evidence
of a frog's "bug detector": certain ganglion cells in a frog's optic nerve
which respond intensely to a small, dark object moving around within a
certain retinal region. The obvious evolutionary explanation is the frog's
need to rapidly identify insects flying about within range. However, the level
of visual analysis varies, with higher animals such as cats and primates having
relatively simple retinal detectors and far more work being carried out in the
visual cortex.
Perception is not simply a matter of
reproducing a 2-dimensional image like a photograph, but rather of extracting
information from it to form an internal representation of its 3-dimensional
content. In humans, this appears to come about through a multi-layered system:
redundancy reduction at the retinal level, then simple and complex cells in the
visual cortex, whose rates of firing in conjunction with each other build up an
increasingly sophisticated coding of the input's contours, shapes and motions.
This information is then processed with reference to memory to develop
recognised patterns into objects and then scenes.